In [1]:
import pandas as pd
import seaborn as sns
import plotly.express as px

import matplotlib.pyplot as plt
In [2]:
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"

Matplotlib¶

For this exercise, we have written the following code to load the stock dataset built into Plotly Express.

In [3]:
stocks = px.data.stocks()
stocks.head()
Out[3]:
         date      GOOG      AAPL      AMZN        FB      NFLX      MSFT
0  2018-01-01  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
1  2018-01-08  1.018172  1.011943  1.061881  0.959968  1.053526  1.015988
2  2018-01-15  1.032008  1.019771  1.053240  0.970243  1.049860  1.020524
3  2018-01-22  1.066783  0.980057  1.140676  1.016858  1.307681  1.066561
4  2018-01-29  1.008773  0.917143  1.163374  1.018357  1.273537  1.040708

Question 1:¶

Select a stock and create a suitable plot for it. Make sure the plot is readable and shows the relevant information, such as dates and values.

In [4]:
goog_stocks = stocks[['date','GOOG']]
x1 = goog_stocks['date']
y1 = goog_stocks['GOOG']
fig1, ax1 = plt.subplots(figsize=(15,9))
ax1.plot(x1,y1)

# Set title
ax1.set_title('Google stock')

# Show only every 10th date as a tick so the labels do not overlap
xticks1 = range(0, stocks.shape[0], 10)
ax1.set_xticks(xticks1)

# Label the horizontal axis
ax1.set_xlabel('Date')

# Label the vertical axis
ax1.set_ylabel('Stock Value')

plt.show()
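
The tick thinning above is needed because the dates appear to be stored as strings, so each one is treated as a separate category. As a minimal sketch of an alternative, the dates can be parsed with `pd.to_datetime` so that Matplotlib chooses the date ticks itself. The `sample` frame below is only the first five GOOG rows copied from the `stocks.head()` output, and the y-axis wording assumes the values are relative to the first date (every stock starts at 1.0).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run as a plain script; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First five GOOG rows copied from the stocks.head() output above
sample = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"],
    "GOOG": [1.000000, 1.018172, 1.032008, 1.066783, 1.008773],
})

# Parse the date strings so the x-axis becomes a real time axis
sample["date"] = pd.to_datetime(sample["date"])

fig, ax = plt.subplots(figsize=(8, 4))
(line,) = ax.plot(sample["date"], sample["GOOG"], marker="o")
ax.set_title("Google stock (first five weeks)")
ax.set_xlabel("Date")
ax.set_ylabel("Stock value (relative to first date)")

# Rotate and align the date labels so they stay readable
fig.autofmt_xdate()
```

With a datetime axis there is no need to pick tick positions by row number, at the cost of one extra conversion step.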

Question 2:¶

You've already plotted data from one stock. You can also plot several stocks on the same axes to support comparison.
To distinguish the lines, customise their line styles, markers and colors, and add a legend to the plot.

In [5]:
fig2, ax2 = plt.subplots(figsize=(15,9))

# Extract values to use for the horizontal axis
x2 = stocks['date']

# Plot each set of y values - the stock values of each stock
ax2.plot(x2, stocks['GOOG'], label = "GOOG", linestyle='-', marker='^')
ax2.plot(x2, stocks['AAPL'], label = "AAPL", linestyle='--', marker='.')
ax2.plot(x2, stocks['AMZN'], label = "AMZN", linestyle=':', marker='>')
ax2.plot(x2, stocks['FB'], label = "FB", linestyle='-.', marker='p')
ax2.plot(x2, stocks['NFLX'], label = "NFLX", linestyle='--', marker='o')
ax2.plot(x2, stocks['MSFT'], label = "MSFT", linestyle=':', marker='*')

# Set title
ax2.set_title('Stocks')

# Label the horizontal axis
# Use the x-axis ticks created in Question 1
ax2.set_xticks(xticks1)
ax2.set_xlabel('Date')

# Label the vertical axis
ax2.set_ylabel('Stock Value')

ax2.legend()
plt.show()
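
The six `ax2.plot` calls differ only in the column name and the style. A possible way to shorten them, sketched here on three stocks and the first five rows copied from the `stocks.head()` output, is to keep the styles in a dictionary and loop over it. The `styles` mapping and the `sample` frame are illustrative names, not part of the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run as a plain script; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of three stocks, copied from the stocks.head() output above
sample = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"],
    "GOOG": [1.000000, 1.018172, 1.032008, 1.066783, 1.008773],
    "AAPL": [1.000000, 1.011943, 1.019771, 0.980057, 0.917143],
    "AMZN": [1.000000, 1.061881, 1.053240, 1.140676, 1.163374],
})

# Hypothetical mapping: stock name -> (line style, marker)
styles = {"GOOG": ("-", "^"), "AAPL": ("--", "."), "AMZN": (":", ">")}

fig, ax = plt.subplots(figsize=(8, 4))
for name, (linestyle, marker) in styles.items():
    ax.plot(sample["date"], sample[name], label=name,
            linestyle=linestyle, marker=marker)

ax.set_title("Stocks")
ax.set_xlabel("Date")
ax.set_ylabel("Stock Value")
ax.legend()

# Collect the legend entries to confirm every stock is listed
labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Adding another stock then only requires one more entry in `styles`.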

Seaborn¶

First, load the tips dataset.

In [6]:
tips = sns.load_dataset('tips')
tips.head()
Out[6]:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Question 3:¶

Let's explore this dataset. Pose a question and create a plot that helps answer it.

My question is:

  • Are there differences between male and female customers when it comes to giving tips?
In [7]:
# Generate a box plot to compare the tips given by female and male customers
# Draw it on the axes created here so the figure size takes effect explicitly
fig3, ax3 = plt.subplots(figsize=(15,9))
sns.boxplot(x='sex', y='tip', data=tips, ax=ax3)

plt.show()
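
A box plot shows medians and spread visually. The same comparison can be read off as numbers with a grouped summary. The sketch below uses only the five rows shown in the `tips.head()` output, so its figures say nothing about the full dataset. It is meant to illustrate the `groupby` pattern, which could be applied to `tips` directly.

```python
import pandas as pd

# The five rows shown in the tips.head() output above (sex and tip only)
sample = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Count, median and mean tip for each group
summary = sample.groupby("sex")["tip"].agg(["count", "median", "mean"])
print(summary)
```

The `median` column corresponds to the line inside each box of the box plot.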

Plotly Express¶

Question 4:¶

Redo the above exercises (Questions 2 & 3) with Plotly Express. Create diagrams that you can interact with.

The stocks dataset¶

Hints:

  • Turn the stocks dataframe into a long-format structure that Plotly Express can pick up easily
In [8]:
# Reshape the stocks dataframe from wide to long format: each stock column
# becomes rows of (date, stock, stock value), with date kept as a regular column
stocks_new = stocks.melt(id_vars=["date"], 
        var_name="stock", 
        value_name="stock value")

# Print the rearranged stocks data to see what the new dataframe looks like
print(stocks_new.head())

# Plot the stock values based on dates in a line plot
# Differentiate the lines by the stock names
fig4 = px.line(
    stocks_new, x="date", y="stock value", color="stock", 
    hover_data=['stock']
)

fig4.show()
         date stock  stock value
0  2018-01-01  GOOG     1.000000
1  2018-01-08  GOOG     1.018172
2  2018-01-15  GOOG     1.032008
3  2018-01-22  GOOG     1.066783
4  2018-01-29  GOOG     1.008773
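
To make the reshape concrete, here is a minimal sketch on a tiny frame: three dates and two stocks, with values copied from the `stocks.head()` output. Melting 3 rows with 2 value columns gives 3 × 2 = 6 rows, and `pivot` reverses the operation. The names `wide`, `long` and `restored` are illustrative.

```python
import pandas as pd

# Three dates and two stocks, values copied from the stocks.head() output above
wide = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15"],
    "GOOG": [1.000000, 1.018172, 1.032008],
    "AAPL": [1.000000, 1.011943, 1.019771],
})

# Wide -> long: one row per (date, stock) pair
long = wide.melt(id_vars=["date"], var_name="stock", value_name="stock value")

# Long -> wide again: dates become the index, stocks become the columns
restored = long.pivot(index="date", columns="stock", values="stock value")

print(long)
print(restored)
```

The long form is what `color="stock"` relies on: one column names the group and another holds the value.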

The tips dataset¶

In [9]:
# Observe how the amount of tip changes with the total bill using a scatter plot
# Differentiate the points by sex
fig5 = px.scatter(
    tips, x="total_bill", y="tip", color="sex", 
    hover_data=['sex']
)
fig5.show()

Question 5:¶

Recreate the bar plot below that shows the population of each continent for the year 2007.

Hints:

  • Extract the 2007 data from the dataframe and process it as needed (for example, sum the population per continent)
  • Use the Plotly Express bar function (px.bar)
  • Use a different color for each continent
  • Sort the continents in the visualisation using the axis layout settings
  • Add text to each bar showing its population
In [10]:
# Load data
df = px.data.gapminder()
df.head()
Out[10]:
       country continent  year  lifeExp       pop   gdpPercap iso_alpha  iso_num
0  Afghanistan      Asia  1952   28.801   8425333  779.445314       AFG        4
1  Afghanistan      Asia  1957   30.332   9240934  820.853030       AFG        4
2  Afghanistan      Asia  1962   31.997  10267083  853.100710       AFG        4
3  Afghanistan      Asia  1967   34.020  11537966  836.197138       AFG        4
4  Afghanistan      Asia  1972   36.088  13079460  739.981106       AFG        4
In [11]:
# Keep only the 2007 rows, then group them by continent
# and sum the numeric columns (including each continent's population).
# numeric_only=True skips text columns such as country, which newer
# pandas versions would otherwise try to concatenate.
df_2007 = df.query('year==2007')
df_2007_new = df_2007.groupby('continent').sum(numeric_only=True)
print(df_2007_new.head())

# Plot the continents' populations in a horizontal bar plot
# Differentiate the bars by continent and order them by population
fig6 = px.bar(
    df_2007_new, x="pop", y=df_2007_new.index, color=df_2007_new.index,
    text="pop", orientation='h'
)
fig6.update_yaxes(categoryorder="max descending")
fig6.show()
             year   lifeExp         pop      gdpPercap  iso_num
continent                                                      
Africa     104364  2849.914   929539692  160629.695446    23859
Americas    50175  1840.203   898871184  275075.790634     9843
Asia        66231  2334.040  3811953827  411609.886714    13354
Europe      60210  2329.458   586098529  751634.449078    12829
Oceania      4014   161.439    24549947   59620.376550      590
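
The raw population figures are long to read as bar labels. As a sketch, the continent totals printed above can be sorted and given shorter labels in pandas before plotting. The `label` column and the billions format are illustrative choices. Such a column could then be passed to `px.bar` through its `text` argument instead of `"pop"`.

```python
import pandas as pd

# Continent populations for 2007, copied from the printed output above
totals = pd.DataFrame({
    "continent": ["Africa", "Americas", "Asia", "Europe", "Oceania"],
    "pop": [929539692, 898871184, 3811953827, 586098529, 24549947],
})

# Sort from the most to the least populous continent
ordered = totals.sort_values("pop", ascending=False).reset_index(drop=True)

# Hypothetical label format: population in billions with two decimals
ordered["label"] = ordered["pop"].map(lambda p: f"{p / 1e9:.2f}B")

print(ordered)
```

Sorting in pandas and sorting through the axis `categoryorder` setting are two routes to the same ordering. The second keeps the dataframe untouched.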